Abstract. Cortex is an automatic generic document summarization
system. To select the most relevant sentences of a document, it uses
an optimal decision algorithm that combines several metrics. The
metrics process, weight and extract relevant sentences by means of
statistical and informational algorithms. This technique might improve a
Question-Answering system, whose function is to provide an exact answer
to a question in natural language. In this paper, we present the results
obtained by coupling the Cortex summarizer with a Question-Answering
system (QAAS). Two configurations have been evaluated. In the first one,
a low compression level is selected and the summarization system is only
used as a noise filter. In the second configuration, the system actually
functions as a summarizer, with a very high level of compression. Our
results on a French corpus demonstrate that coupling an Automatic
Summarization system with a Question-Answering system is promising.
The system was then adapted to generate a customized summary
depending on the specific question. Tests on a French multi-document
corpus were carried out, and the personalized QAAS system obtains
the best performance.

Keywords: Automatic Summarization, Question-Answering systems, Text
retrieval, Vector Space Model

1 Introduction 

Automatic summarization is indispensable for coping with ever-increasing
volumes of valuable information. An abstract is by far the most concrete and
most recognized kind of text condensation [1]. We adopted a simpler method,
usually called extraction, which generates summaries by extracting relevant
sentences [2,3]. Essentially, extraction aims at producing a shorter version
of the text by selecting the most relevant sentences of the original text,
which we juxtapose without any modification. Linguistic methods, notably
semantic analysis, are relevant, but their application remains difficult or
limited to restricted domains [4,5]. The vector space model [6,7] has been
used in information extraction, information retrieval and question-answering,
and it may also be used in text summarization. Furthermore, statistical,
neural, SVM and connectionist methods are often employed in several areas of
text processing [8,9,10,11,12]. At present, existing techniques can only
produce summaries of the informative type [13]. Our research tries to generate
this kind of summary. Cortex is an automatic summarization system, recently
developed [13], which combines several statistical methods with an optimal
decision algorithm to choose the most relevant sentences.

An open domain Question-Answering (QA) system has to answer precisely a
question expressed in natural language. QA systems face a fine-grained
and difficult task because they are expected to supply specific information
and not whole documents. At present there is a strong demand for this kind of
text processing system on the Internet. A QA system comprises, a priori, the
following stages [15]:

— Transform the questions into queries, then associate them to a set of docu- 
ments; 

— Filter and sort these documents to calculate various degrees of similarity; 

— Identify the sentences which might contain the answers, then extract text 
fragments from them which constitute the answers. In this phase an analysis 
using Named Entities (NE) is essential to find the expected answers. 

Most research efforts in summarization emphasize generic summarization
[16,17,18]. User query terms are commonly used in information retrieval tasks.
However, few papers in the literature propose to employ this approach
in summarization systems [19,20,21]. In the systems described in [19], a
learning approach is used: a document set is used to train a classifier that
estimates the probability that a given sentence is included in the extract.
In [20], several features (document title, location of a sentence in the
document, cluster of significant words and occurrence of terms present in the
query) are applied to score the sentences. In [21], learning and feature
approaches are combined in a two-step system: a training system and a
generator system. Score features include short sentence length, sentence
position in the document, sentence position in the paragraph, and tf.idf
metrics. Our generic summarization system includes a set of ten independent
metrics combined by a Decision Algorithm. Query-based summaries can be
generated by our system using a modification of the scoring method. In both
cases, no training phase is necessary in our system.

In this paper we present the coupling of an automatic summarization
algorithm with a Question-Answering system, which makes it possible to reduce
the document search space and to increase the number of correct answers
returned by the system. Two scenarios have been evaluated: in the first one
the summarization process is used as a noise filter (it condenses texts at a
low compression rate), and in the second one as a true summarization system
(it condenses at high rates). In Section 2, the pre-processing technique is
presented. In Section 3, the Cortex algorithm is described: several metrics
and a Decision Algorithm (DA) are presented. In Section 4, we analyze the
sensitivity of the metrics and the DA. In Section 5, two main evaluation
methods are described and applied. In Sections 6 and 7, experiments and
results of applying both the Cortex and QA systems are described. Finally,
some conclusions and future work are presented.

2 Pre-processing 

We process texts according to the vector space model [22], a text
representation very different from linguistic structural analysis, but which
allows large volumes of documents to be processed efficiently [6,13]. Texts
are represented in a vector space to which several classic numeric algorithms
are applied.

Filtering. In a first step, the Cortex algorithm pre-processes each text in
the corpus. The original text contains $N_W$ words, which can be function
words (articles, prepositions, adjectives, adverbs), nouns and conjugated
verbs, but also compound words, which often represent a very specific concept.
All these words may occur repeatedly. It is important to decide whether to use
inflected forms or base forms. That is why we prefer the more abstract notion
of term instead of word [22]. To reduce the complexity of the text, various
filters are applied to the lexicon: the (optional) deletion of function words
and auxiliary verbs⁴, common expressions⁵, text in parentheses (which often
contains additional information that is not essential for general
comprehension), numbers (numeric and/or textual)⁶ and symbols such as ($),
(#), (*), etc. In this stage, we employ several negative dictionaries or
generic stoplists.
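
The filters listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `STOPLIST` is a tiny hypothetical sample (the real system relies on several much larger negative dictionaries), `filter_text` and its `keep_numbers` flag are illustrative names rather than part of Cortex, and numbers written out in words are not handled.

```python
import re

# Hypothetical mini stoplist, for illustration only.
STOPLIST = {"le", "la", "les", "de", "des", "un", "une", "et", "est"}

def filter_text(text, keep_numbers=False):
    """Return the lowercase tokens left after the filters described above."""
    text = re.sub(r"\([^)]*\)", " ", text)             # drop text in parentheses
    if not keep_numbers:
        text = re.sub(r"\d+(?:[.,]\d+)?", " ", text)   # drop numeric tokens
    text = re.sub(r"[$#*]", " ", text)                 # drop symbols such as $, #, *
    tokens = re.findall(r"[^\W_]+", text.lower())      # words, accents included
    return [t for t in tokens if t not in STOPLIST]

print(filter_text("Le chat (un félin) mange 3 souris et 2 oiseaux"))
print(filter_text("Le chat (un félin) mange 3 souris et 2 oiseaux", keep_numbers=True))
```

The `keep_numbers` switch mirrors the choice described in the text: numbers are deleted for generic abstracts but kept when the summarizer is coupled with the QA system.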

Lemmatization and Stemming. In morphologically rich languages, such as the
Romance languages, it is essential to lemmatize the words. This considerably
reduces the size of the lexicon. Simple lemmatization consists of finding
the lemma of the conjugated verbs and replacing the plural and/or feminine
words with the singular masculine form before counting the number of
occurrences. For this task, a dictionary containing approximately 330,000
entries was used. After lemmatization, we applied a stemming [23,24] affix
removal algorithm (based on Porter's rules [24]) to obtain the stem of each
lemma. Stemming (or conflating) words reduces the morphological variants of
the words to their stem [25]. In these processes, it is assumed that
semantically related words have the same stem. So the words chante,
chantaient, chanté, chanteront and possibly chanteur⁷ are transformed to
the same form chanter (to sing). This twofold process mitigates the curse
of dimensionality, which causes severe problems of matrix representation for
large data volumes. Lemmatization/stemming identifies a number of terms
which defines the dimensions of the vector space. Some additional mechanisms
to decrease the size of the lexicon are also applied. One of them is
compound word detection: compound words are found, then transformed
into a unique lemmatized/stemmed term⁸. We also investigate other methods
for lexicon reduction, for example grouping synonyms by means of
specialized dictionaries.

4 Our examples and tests in this paper are all in French: être (to be), avoir (to have),
pouvoir (can), devoir (must)...

5 par exemple (for example), ceci (that is), chacun/chacune (each of) ...

6 In the case of generic abstracts, we decided to delete numbers. However, when
coupled with the QA system, we do not delete them, because the answers are often of
numerical type.

Split sentences. Given the cognitive nature of summaries, we split the texts
into variable-length segments (sentences), according to one or more suitable
criteria⁹. Fixed-size segmentation was ruled out, because we want to extract
complete sentences. Period (.), carriage return (CR), colon (:), question
mark (?) and exclamation mark (!) (or their combinations) may be taken
as sentence delimiters. Since electronic addresses and Internet sites (URLs)
always contain periods, it is essential to detect and transform them in this
phase.
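
One possible way to implement this segmentation is sketched below. The approach (masking URLs and e-mail addresses with placeholders before splitting, then restoring them) and the regular expressions are assumptions made for illustration; the text only states that such strings must be detected and transformed, not how Cortex does it.

```python
import re

# Assumed patterns: URLs starting with http(s):// or www., and simple e-mail addresses.
PROTECTED = re.compile(r"(?:https?://|www\.)\S*[^\s.:?!]|[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DELIMITERS = re.compile(r"[.:?!\n]+")   # period, colon, ?, !, line break, or combinations

def split_sentences(text):
    """Split text into sentences while keeping URLs and e-mail addresses whole."""
    stash = []

    def mask(match):
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"   # placeholder without delimiter characters

    masked = PROTECTED.sub(mask, text)
    segments = (s.strip() for s in DELIMITERS.split(masked))

    def unmask(segment):
        return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], segment)

    return [unmask(s) for s in segments if s]

print(split_sentences("Visitez http://www.copernic.com pour plus d'infos. Est-ce clair ? Oui !"))
```

Without the masking step, the colon and the periods inside the URL would each be treated as a sentence boundary.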

Title detection The titles (document title and section titles) found in a doc- 
ument are very informative. However, in raw texts, the title is not marked 
explicitly. Therefore, to detect it, some heuristics are needed. Conceptually, 
the title can be processed as a particular segment. A segment is declared to 
be the "main-title" following the rules below: 

— The words of the first sentence are in capital letters. 

— The first sentence. 

— The first sentence is separated from the text by a carriage return. 

— The 10 first words of a text. 

At the end of these processes, an XML file with a simple structure is obtained: 

<?xml version="1.0" encoding="UTF-8" ?>
<Texte Langue="Fra" Title="This is the title">
<S> Text for processing </S>

<Subtitle_1> Title of section 1 </Subtitle_1>
<S> Text for processing </S>
<S> Text for processing </S>

<Subtitle_2> Title of section 2 </Subtitle_2>

<S> More text for processing </S>

</Texte>

After pre-processing, a text representation in the Vector Space Model is
constructed. Then, we apply several statistical processes to score the
sentences. The summary is generated by selecting the sentences with the
highest scores. In Figure 1 we present a diagram of the Cortex system,
developed at the Laboratoire Informatique d'Avignon (LIA). In the next
section, we present the sentence weighting algorithm of Cortex.

7 Respectively sing, sang, sung, (will) sing and singer.

8 pomme de terre / pommes de terre becomes pomme de terre (potato).

9 In the case of summarization of very long texts, it is more suitable to apply a
segmentation by paragraph or page rather than by sentences.

Fig. 1. General architecture of the Automatic Summarization system LIA-Cortex.



3 The Cortex algorithm 

In this section, the matrices, the metrics and the Decision Algorithm of the
Cortex system are described. After the pre-processor has filtered the text
and lemmatized the words (to group those of the same family), the selection of
relevant sentences can start. For every sentence, the metrics, which are
all based on the matrices of either presence or frequency of terms, are
calculated and combined by the Decision Algorithm described later (see
Section 3.3). The sentences are then ranked according to the values obtained.
Depending on the desired compression rate, the sorted sentences are used to
produce the summary. We define the following variables:

$N_W$: Total number of different raw words.
$N_S$: Total number of sentences.
$N_T$: Total number of titles (document title and section titles) in the text.
$N_M$: Number of different terms remaining after filtering.
$N_L$: Size of the "relevant" lexicon, i.e. the number of words appearing at
least twice in the text.

Based on the terms that remain in the text after filtering, a frequency matrix
$\gamma$ is constructed in the following way: every element $\gamma^\mu_i$ of
this matrix represents the number of occurrences of the word $i$ in the
sentence $\mu$.

$$\gamma = \begin{pmatrix}
\gamma^1_1 & \gamma^1_2 & \cdots & \gamma^1_{N_L} \\
\gamma^2_1 & \gamma^2_2 & \cdots & \gamma^2_{N_L} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma^{N_S}_1 & \gamma^{N_S}_2 & \cdots & \gamma^{N_S}_{N_L}
\end{pmatrix}, \qquad \gamma^\mu_i \in \{0, 1, 2, \ldots\} \tag{1}$$

Another matrix $\xi$, called binary virtual or presence matrix, is defined as:

$$\xi^\mu_i = \begin{cases} 1 & \text{if } \gamma^\mu_i \neq 0 \\ 0 & \text{elsewhere} \end{cases} \tag{2}$$

Every row of these matrices represents a sentence of the text. The sentences
are indexed by a value $\mu$ varying from 1 to $N_S$. Every column represents
a term of the text. Terms are indexed by a value $i$ varying from 1 to $N_L$.
The titles are stored in another matrix $\gamma^T$. Matrices $\gamma$ and
$\gamma^T$ are the frequency matrix of the sentences and the frequency matrix
of the titles, respectively.
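
As a minimal sketch of how the frequency and presence matrices could be built from pre-processed sentences, consider the following Python fragment. The function name `build_matrices`, the use of plain nested lists (rather than the sparse structures the system needs for large corpora) and the alphabetical ordering of the lexicon are illustrative assumptions; only the "at least twice" rule for the relevant lexicon comes from the definitions above.

```python
def build_matrices(sentences):
    """sentences: list of sentences, each a list of filtered, lemmatized terms.

    Returns (lexicon, gamma, xi): rows are sentences, columns are terms.
    """
    counts = {}
    for sentence in sentences:
        for term in sentence:
            counts[term] = counts.get(term, 0) + 1
    # Relevant lexicon: terms appearing at least twice in the text.
    lexicon = sorted(term for term, c in counts.items() if c >= 2)
    # Frequency matrix: occurrences of term i in sentence mu.
    gamma = [[sentence.count(term) for term in lexicon] for sentence in sentences]
    # Presence matrix: 1 where the frequency is non-zero, 0 elsewhere.
    xi = [[1 if g else 0 for g in row] for row in gamma]
    return lexicon, gamma, xi

lexicon, gamma, xi = build_matrices([
    ["chat", "manger", "souris"],
    ["chat", "dormir"],
    ["souris", "manger", "manger"],
])
print(lexicon)  # "dormir" occurs only once, so it is not in the relevant lexicon
print(gamma)
print(xi)
```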

The pre-processing phase transforms the text into a set of $N_S$ sentences or
segments and $N_L$ retained terms which are regarded as relevant. The relation
$N_L < N_M < N_W$ is always true. We define $\rho_L$ as:

$$\rho_L = \frac{N_L}{N_W} \tag{3}$$



It is important to note that the matrices $\gamma$ and $\xi$ are very sparse
because every row (representing a sentence) contains only a small part of the
vocabulary. Because of this, fast matrix manipulation algorithms had to be
adapted and implemented. We estimated $\rho_L$ over 20,000 text documents
(from the Le Monde corpus, described in Section 7, Experiments II). We
obtained $\rho_L \approx 0.52$, on average.

To obtain the final summary, the user sets the compression ratio $\tau$ as a
fraction (in percent) of the number $N_S$ of sentences, or the number $N_W$ of
words.



3.1 The metrics 

Important mathematical and statistical information can be gained from the
"term-segment" matrices $\xi$ and $\gamma$, to be used in the condensation
process. In our experiments, $\Gamma = 10$ metrics were calculated
(frequencies, entropy, Hamming and hybrid) based on these matrices. The more
relevant a segment is, the higher the values of its metrics. The $\Gamma$
metrics used are explained below:



1. Frequency measures.

(a) Term Frequency F: The Term Frequency metric [26] counts the number
of relevant words in every sentence $\mu$. Thus, if a sentence contains more
important words, it has a greater chance of being retained. A longer
sentence usually includes more relevant words, so it also has a greater
chance of being retained. Consequently, the summaries generated with
this metric (generally) contain the long sentences¹¹.

$$F^\mu = \sum_{i=1}^{N_L} \gamma^\mu_i \tag{4}$$

Note that we can easily calculate $T$, the total number of terms occurring
in the text after filtering:

$$T = \sum_{\mu=1}^{N_S} \sum_{i=1}^{N_L} \gamma^\mu_i = \sum_{\mu=1}^{N_S} F^\mu \tag{5}$$

(b) Interactivity of segments I: The Cortex system exploits the existence
of a network of words of the same family present in several sentences.
For every distinct term in a sentence, we count the number of sentences,
other than the current one, containing this word¹². The current
sentence $\mu$ is then said to be in interactivity with $N_i$ sentences
through the word $i$. The $N_i$ values of all the words in the sentence are
added to obtain its weight.

$$I^\mu = \sum_{\substack{i=1 \\ \xi^\mu_i \neq 0}}^{N_L} \sum_{\substack{j=1 \\ j \neq \mu}}^{N_S} \xi^j_i \tag{6}$$

(c) Sum of probability frequencies Δ: This metric balances the frequency
of the words in the sentences according to their global frequency:

$$\Delta^\mu = \sum_{i=1}^{N_L} p_i \gamma^\mu_i \tag{7}$$

With:

$$p_i = \frac{1}{T} \sum_{\mu=1}^{N_S} \gamma^\mu_i \tag{8}$$

The values $p_i$ are the probabilities of occurrence of term $i$ in the text.
The more often a word (or a family of words) occurs in a text, the greater
will be its weight in the sentences. The product $p_i \gamma^\mu_i$ of metric
$\Delta$ is not the same as tf.idf (Term frequency - Inverse document
frequency [26]) weighting: the $p_i$ are probability values of a term $i$ in
all segments, instead of inverse document frequencies, and no logarithm or
square function is used in the calculations.

11 It is important to note that in our context, metric F (and entropy E below) is useful
only after the filtering/lemmatization processes: the function words and words with
F < 1 are not present in the lexicon of $N_L$ words.

12 For example, if the word "aimer" ("to love") occurs twice in a sentence, that accounts
for a single "distinct" word. For this reason, we use the presence matrix $\xi$.

2. Entropy. The entropy E is another measure depending on the probability
of a word in a text. If the probability $p_i$ of a word is high, then the
sentences which contain this word may be favoured:

$$E^\mu = - \sum_{\substack{i=1 \\ \xi^\mu_i \neq 0}}^{N_L} p_i \log_2 p_i \tag{9}$$

3. Measures of Hamming. These metrics use a Hamming matrix $H$, a square
matrix of size $N_L \times N_L$, defined in the following way:

$$H^m_n = \sum_{j=1}^{N_S} \begin{cases} 1 & \text{if } \xi^j_m \neq \xi^j_n \\ 0 & \text{elsewhere} \end{cases} \qquad \text{for } m \in [2, N_L],\ n \in [1, m] \tag{10}$$

The Hamming matrix is a lower triangular matrix where the index $m$
represents the row and the index $n$ the column, both corresponding to word
indices, where $m > n$.

The idea is to identify the terms which are semantically connected. In this
way, two terms which might be synonyms will have a high value in $H$
because we do not expect to find them in the same sentence, i.e. this matrix
represents the number of sentences that contain only one of two words but
not both.

(a) Hamming distances Ψ: The main idea is that if two important words
(maybe synonyms) are in the same sentence, this sentence must certainly
be important. The importance of every pair of words directly corresponds
to the value in the Hamming matrix $H$¹³. The metric of the Hamming
distances is calculated as follows:

$$\Psi^\mu = \sum_{\substack{m=2 \\ \xi^\mu_m \neq 0}}^{N_L} \sum_{\substack{n=1 \\ \xi^\mu_n \neq 0}}^{m} H^m_n \tag{11}$$

(b) Hamming weight of segments φ: The Hamming weight of segments is
similar to the frequency metric F. In fact, instead of adding the
frequencies of a sentence, the occurrences $\xi$ are added. Thus, a sentence
with a large vocabulary is favoured.

$$\phi^\mu = \sum_{i=1}^{N_L} \xi^\mu_i \tag{12}$$



13 The sum of the Hamming distances is the most resource-intensive metric to
calculate. It takes more time than all the other metrics combined because its
complexity is $O(N_L^2)$.



(c) Sum of Hamming weight of words per segment Θ: This metric closely
resembles the interactivity metric I. The difference is that for every
word present in a sentence $\mu$, all the occurrences of this word in the text
are counted, and not only their presence in all other sentences except the
current sentence. We thus obtain:

$$\Theta^\mu = \sum_{\substack{i=1 \\ \xi^\mu_i \neq 0}}^{N_L} \psi_i \tag{13}$$

and $\psi_i$ as the sum of the occurrences of every word:

$$\psi_i = \sum_{\mu=1}^{N_S} \xi^\mu_i \tag{14}$$

(d) Hamming weight heavy Π: Among the sentences containing the same set
of important words, how do we know which one is the best, i.e. which one
of these sentences is the most informative? The solution is to choose the
one that contains the biggest part of the lexicon. The metric
$\Theta^\mu$ is already relatively sensitive to the different words in a
sentence. However, if this metric is again multiplied by the number of
different words in a sentence ($\phi^\mu$), we are able to identify the most
informative sentences.

$$\Pi^\mu = \phi^\mu \, \Theta^\mu \tag{15}$$

(e) Sum of Hamming weights of words by frequency Ω: The sum of the
Hamming weights of the words by frequencies uses the frequencies as
factor instead of the presence, as in the case of the metric $\Theta^\mu$.
The sentences containing the most important words several times will be
favoured.

$$\Omega^\mu = \sum_{i=1}^{N_L} \psi_i \gamma^\mu_i \tag{16}$$

Note that $\psi_i$ has been calculated for the metric Θ, and $\gamma^\mu_i$
represents the number of times that the term $i$ is present in the sentence
$\mu$.
4. Titles and subtitles. Almost all texts have a main title. Some also
have subtitles. So, important information can be deduced from the document
structure. Angle between a title and a sentence θ: The purpose of this metric
is to favour the sentences which refer to the subject in the title. In fact,
we compare, word by word, every sentence to the title¹⁴ (main title or
subtitle). To combine the comparisons, we calculate the normalized
$N_L$-dimensional scalar vector product between the sentence and the title
vector $\gamma^T$, and finally the cosine of this value:

$$\theta^\mu = \cos\left( \frac{\sum_{i=1}^{N_L} \gamma^\mu_i \gamma^T_i}{\|\gamma^\mu\| \, \|\gamma^T\|} \right) \tag{17}$$



14 This metric will be used to obtain personalized abstracts (see Subsection 7.4).
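
To make the definitions of this section concrete, the sketch below computes nine of the ten metrics for every sentence from a small frequency matrix, using plain Python lists. It is an illustration under stated assumptions, not the Cortex implementation: the function name `cortex_metrics` and the dictionary keys are hypothetical, the word weight `psi` is taken to count the sentences in which a word is present, the entropy and Hamming distances are restricted to the words present in the sentence, and the angle metric is left out because it needs a title vector.

```python
import math

def cortex_metrics(gamma):
    """gamma: frequency matrix (rows = sentences, columns = terms), non-empty."""
    n_s, n_l = len(gamma), len(gamma[0])
    xi = [[1 if g else 0 for g in row] for row in gamma]       # presence matrix
    total = sum(map(sum, gamma))                               # T: all term occurrences
    # p[i]: probability of occurrence of term i in the text
    p = [sum(gamma[mu][i] for mu in range(n_s)) / total for i in range(n_l)]
    # psi[i]: number of sentences containing term i (assumed definition)
    psi = [sum(xi[mu][i] for mu in range(n_s)) for i in range(n_l)]
    # ham[m][n]: number of sentences containing exactly one of the terms m and n
    ham = [[sum(xi[j][m] != xi[j][n] for j in range(n_s)) for n in range(n_l)]
           for m in range(n_l)]
    results = []
    for mu in range(n_s):
        present = [i for i in range(n_l) if xi[mu][i]]
        theta_sum = sum(psi[i] for i in present)
        results.append({
            "F": sum(gamma[mu]),                                # term frequency
            "I": sum(psi[i] - 1 for i in present),              # other sentences sharing a word
            "Delta": sum(p[i] * gamma[mu][i] for i in range(n_l)),
            "E": -sum(p[i] * math.log2(p[i]) for i in present),
            "Psi": sum(ham[m][n] for m in present for n in present if n < m),
            "phi": len(present),                                # distinct words in the sentence
            "Theta": theta_sum,
            "Pi": len(present) * theta_sum,
            "Omega": sum(psi[i] * gamma[mu][i] for i in range(n_l)),
        })
    return results

metrics = cortex_metrics([[1, 1, 1], [1, 0, 0], [0, 2, 1]])
for mu, values in enumerate(metrics):
    print(mu, values)
```

The Hamming matrix is built in full here for readability; only its lower triangle is needed, and the pairwise loop is what makes the Hamming distances the most expensive metric.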



3.2 Normalization of the metrics 



Before using the $\Gamma$ metrics in the decision algorithm, they have to be
normalized. To this end, every metric is calculated for all the sentences. The
normalized value $\|\lambda^\mu\|$ for a sentence $\mu = 1, \ldots, N_S$ is:

$$\|\lambda^\mu\| = \frac{\lambda^\mu - m}{M - m} \tag{18}$$

where:

$$m = \min \{\lambda^j \text{ for } j \in [1, N_S]\}, \qquad M = \max \{\lambda^j \text{ for } j \in [1, N_S]\}$$

Every normalized metric takes values in the range [0,1].
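
The normalization is a standard min-max rescaling, which can be sketched as follows. The function name `normalize` is illustrative, and the handling of a constant metric (where the maximum equals the minimum and the formula is undefined) is an assumption of this sketch: it returns 0.5, a value that the decision algorithm ignores.

```python
def normalize(values):
    """Min-max normalize the raw values of one metric over all sentences."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a metric that cannot separate sentences is treated as undecided.
        return [0.5] * len(values)
    return [(v - lowest) / (highest - lowest) for v in values]

print(normalize([3, 1, 3]))
print(normalize([2, 4, 6]))
```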

3.3 Decision algorithm 

The Decision Algorithm (DA) combines all the normalized metrics in a
sophisticated way. Two averages are calculated: the positive tendency, for
$\|\lambda^\mu\| > 0.5$, and the negative tendency, for
$\|\lambda^\mu\| < 0.5$ (the case $\|\lambda^\mu\| = 0.5$ is ignored). To
calculate these averages, we always divide by the total number of metrics
$\Gamma$ and not by the number of "positive" or "negative" elements (which
would give the real average of the tendencies).

So, by dividing by $\Gamma$, we have developed an algorithm that is more
decisive than the simple average¹⁵ and even more realistic than the real
average of the tendencies. Here is the decision algorithm, which includes the
vote of each metric:

$$\sum\alpha = \sum_{\substack{\nu=1 \\ \|\lambda^\mu_\nu\| > 0.5}}^{\Gamma} \left( \|\lambda^\mu_\nu\| - 0.5 \right) \tag{19}$$

$$\sum\beta = \sum_{\substack{\nu=1 \\ \|\lambda^\mu_\nu\| < 0.5}}^{\Gamma} \left( 0.5 - \|\lambda^\mu_\nu\| \right) \tag{20}$$

$\nu$ is the index of the metrics, $\sum$ is the sum of the absolute
differences between $\|\lambda\|$ and 0.5, $\sum\alpha$ collects the
"positive" normalized metrics, $\sum\beta$ the "negative" normalized metrics,
and $\Gamma$ is the number of metrics used. The value attributed to every
sentence is calculated in the following way:

if $\sum\alpha > \sum\beta$
then $\Lambda^\mu = 0.5 + \frac{\sum\alpha}{\Gamma}$: DA is chosen in order to advantage the segment $\mu$
else $\Lambda^\mu = 0.5 - \frac{\sum\beta}{\Gamma}$: DA is chosen in order to disadvantage it

$\Lambda^\mu$ is the value used to finally decide whether or not to retain the
sentence $\mu$. In the end, the $N_S$ sentences are sorted according to this
value $\Lambda^\mu$; $\mu = 1, \ldots, N_S$. The compression rate $\tau$
determines the final number of sentences, which are chosen from the sorted
list.

15 Contrary to the simple average, which may be ambiguous if the value is close to 0.5, our
algorithm chooses to penalize the sentences with a score of exactly 0.5.
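
The decision rule and the final selection can be sketched in Python as follows. The function names `decision` and `select` are illustrative. Two points are assumptions of this sketch rather than statements from the text: the number of retained sentences is obtained by rounding the compression rate times the number of sentences up to the next integer, and the retained sentences are returned in their original text order.

```python
import math

def decision(normalized):
    """normalized: the normalized values (in [0, 1]) of all metrics for one sentence."""
    n_metrics = len(normalized)
    alpha = sum(v - 0.5 for v in normalized if v > 0.5)   # positive tendency
    beta = sum(0.5 - v for v in normalized if v < 0.5)    # negative tendency
    if alpha > beta:
        return 0.5 + alpha / n_metrics    # advantage the sentence
    return 0.5 - beta / n_metrics         # disadvantage it (ties included)

def select(scores, rate):
    """Indices of the sentences retained for a compression rate given as a fraction."""
    keep = math.ceil(rate * len(scores))  # assumed rounding rule
    ranked = sorted(range(len(scores)), key=lambda mu: scores[mu], reverse=True)
    return sorted(ranked[:keep])          # assumed: original text order

print(decision([0.9, 0.8, 0.2]))   # mostly positive votes: score above 0.5
print(decision([0.9, 0.1]))        # simple average would be 0.5: penalized instead
print(select([0.7, 0.2, 0.9, 0.3], 0.5))
```

The second call illustrates the difference with a simple average: two opposite votes of equal strength do not cancel out into an undecided 0.5 but lead to a score below 0.5.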

4 Metrics sensitivity

We have a set of metrics and a decision algorithm that give a score to each
sentence. However, the metrics are not equally important. What about the
capacity of a metric to discriminate between segments? Imagine the following
situation: four metrics $\lambda_i$; $i = 1, 2, 3, 4$ are applied to a
document split into six segments $s_j$; $j = 1, \ldots, 6$. Metric
$\lambda_1$ gives the maximal value to all segments, therefore its mean value
is $\langle\lambda_1\rangle = 1$ and its variance is $\sigma_1 = 0$. Metric
$\lambda_2$ rejects all segments, so $\langle\lambda_2\rangle = 0$ and
$\sigma_2 = 0$. Metric $\lambda_3$ evaluates all segments with the same value,
$\langle\lambda_3\rangle = 0.5$, so $\sigma_3 = 0$. Finally, metric
$\lambda_4$ gives the maximal value to three segments and 0 to the rest:
$\langle\lambda_4\rangle = 0.5$ and its standard deviation is
$\sigma_4 \approx 0.547$. Which of these four metrics is the best? The answer
is related to the capacity of the metric to separate the pertinent segments
from the non-pertinent ones. The mean value does not represent this measure,
since $\lambda_1$ and $\lambda_2$ each have a constant value (always 0 or
always 1). Neither of them is discriminant, yet they have extreme mean values.
$\lambda_3$ is still worse, because it is an undecided metric: it is incapable
of deciding "yes" or "no". Finally, metric $\lambda_4$ has the same average as
$\lambda_3$ (0.5), but unlike the others, its variance is large. This metric
is better at separating the segments. So, the variance may be used to
calculate the sensitivity of the metrics. We performed a statistical study to
evaluate this capacity. We calculated the sensitivity values on roughly 20,000
documents (about 1 million sentences). The result is shown in Figure 2, on the
left side. In this figure, it is clearly visible that all the metrics have
high sensitivity values, so all of them are important. This is a suitable
property, but what about the mean value of the metrics? On the right side, we
plotted the mean value of each metric. In other words, this value represents
the average compression rate $\langle\rho_i\rangle$ we obtain with the metric
$i$. The angle θ, Hamming distances Ψ and Hamming weight heavy Π metrics
eliminate, on average, between 80% and 90% of the sentences of a text. The
rest of the metrics eliminate 70%. The Decision Algorithm, on average, retains
close to 25% of the sentences. This first study shows that all the metrics are
discriminant and that they have the ability to condense text at high rates.
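
The four-metric thought experiment above can be checked numerically with the Python standard library. The dictionary keys are illustrative names. Note that the quoted dispersion of the fourth metric matches the sample standard deviation (with a divisor of five for six segments), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Scores given by four hypothetical metrics to the six segments of a document.
metrics = {
    "lambda_1": [1, 1, 1, 1, 1, 1],      # accepts everything
    "lambda_2": [0, 0, 0, 0, 0, 0],      # rejects everything
    "lambda_3": [0.5] * 6,               # always undecided
    "lambda_4": [1, 1, 1, 0, 0, 0],      # separates the segments
}

summary = {name: (mean(values), stdev(values)) for name, values in metrics.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: mean = {avg:.3f}, standard deviation = {spread:.4f}")
```

Only the fourth metric has a non-zero dispersion, which is why dispersion rather than the mean is used as the measure of discriminating power.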

However, a finer study of the metrics and the Decision Algorithm has been
performed. We have considered the proportions of advantaged and disadvantaged
segments separately. In Figure 3 we show only two metrics, the first panel
representing the density picture for the angle θ, and the second one the
density shape for the interactivity I. In Figure 4, on the left, we show the
density picture for the Decision Algorithm. It is clear that there are no
undecided values ($\Lambda^\mu \neq \frac{1}{2}$) in the Decision Algorithm,
and that most of the sentences (≈ 87%) have been disadvantaged
($\Lambda^\mu < \frac{1}{2}$). We defined the effective mean compression rates
$\langle\kappa\rangle^+$ and $\langle\kappa\rangle^-$ for every metric as
follows:

$$\langle\kappa\rangle^+ = \frac{1}{\operatorname{card}\{\lambda^\mu > 0.5\}} \sum_{\substack{\mu=1 \\ \lambda^\mu > 0.5}}^{N_S} \lambda^\mu \tag{21}$$

$$\langle\kappa\rangle^- = \frac{1}{\operatorname{card}\{\lambda^\mu < 0.5\}} \sum_{\substack{\mu=1 \\ \lambda^\mu < 0.5}}^{N_S} \lambda^\mu \tag{22}$$

where $\operatorname{card}\{\cdot\}$ represents the cardinality of the set
$\{\cdot\}$. In Figure 4, on the right, the values $\langle\kappa\rangle$ are
shown for every metric and for the DA. In Figure 5, the effective compression
rate $\langle\kappa\rangle$ and its corresponding ratio (in percent) of
advantaged or disadvantaged sentences are shown.
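
A small Python sketch of these two effective rates is given below. The function name `effective_rates` and the returned tuple layout are illustrative; returning `None` when no sentence falls on one side of 0.5 is an assumption of this sketch, since the definitions above are undefined in that case.

```python
def effective_rates(values):
    """values: normalized scores of one metric (or of the DA) over all sentences.

    Returns (kappa_plus, kappa_minus, share_advantaged, share_disadvantaged).
    """
    advantaged = [v for v in values if v > 0.5]
    disadvantaged = [v for v in values if v < 0.5]
    kappa_plus = sum(advantaged) / len(advantaged) if advantaged else None
    kappa_minus = sum(disadvantaged) / len(disadvantaged) if disadvantaged else None
    return (kappa_plus, kappa_minus,
            len(advantaged) / len(values), len(disadvantaged) / len(values))

rates = effective_rates([0.9, 0.7, 0.1, 0.2, 0.3, 0.5])
print(rates)   # the undecided value 0.5 counts in neither group
```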





Fig. 2. On the left: standard deviation of each metric. On the right: mean value of
the metrics and mean value of the Decision Algorithm. Tests were performed on 19,090
text documents containing 916,170 sentences. The metrics are: Frequency F,
Interactivity I, Sum of Hamming weight of words per segment Θ, Hamming distances Ψ,
Hamming weight of segments φ, Hamming weight heavy Π, Sum of Hamming weights
of words by frequency Ω, Entropy E, Sum of probability frequencies Δ and Angle θ.



Order of presentation of segments. Another important robustness test was
performed. We shuffled the sentences at random to generate a new text. This
text was then processed by Cortex and the same values for the Decision
Algorithm were found. Our results showed that the order of presentation of
sentences has no impact on the final decision of the DA. This can be explained
by the fact that our algorithm does not use any sentence-position metric. On
the other hand, tests on the Minds and Word summarizers show that these
methods are sometimes dependent on the order of presentation of sentences.
Indeed, the segmentation of sentences by the separator (:) tends to perturb
their performance.



[Figure 3: two panels, Angle θ (left) and Interactivity (right); horizontal axis: Sentence.]

Fig. 3. Density of the decisions "yes" (advantaged sentence) and "no"
(disadvantaged sentence) on 916,170 sentences, for the Angle θ and Interactivity I
metrics. Every point represents the normalized value $\lambda^\mu$ calculated by the
metric on sentence $\mu$. On the left, we show the Angle metric θ. Most values for
this metric are at the bottom (i.e. the metric decides 0.00, so it strongly
disadvantages many sentences) and they are not visible because they are mapped onto
the horizontal axis. As a result, a sparse density is found. On the right, the
interactivity metric I is shown. Most values are under 0.5, but the density of I is
more uniform than that of θ, i.e. this metric is less decisive than θ (see
Figure 5 for more details).

Fig. 4. On the left: "density" of the Decision Algorithm plotted over 916,170
sentences. On the right: mean values of the Decision Algorithm for "yes" (advantaged
sentence) and "no" (disadvantaged sentence). It is visible that the Decision Algorithm
is powerful (there are no "undecided" sentences, i.e. with Λ = 0.5, within a gap of ≈ 0.1),
and most sentences (nearly 87%) are disadvantaged. The effective compression rates
$\langle\kappa\rangle$ are calculated with equations (21) and (22).

Fig. 5. Effective compression rate $\langle\kappa\rangle$ over 916,170 sentences. On the left, we show
$\langle\kappa\rangle^+$ and the percentage of sentences retained. On the right,
$\langle\kappa\rangle^-$ and the percentage of sentences eliminated. The order of the
metrics on the horizontal axis is the same as that in Figure 2. On the left, we note
that the Angle metric θ retains a low percentage of sentences (~ 10%) with high
decision values (> 0.7). The observation for the same metric, on the right, shows that
it eliminates a large percentage (~ 90%) of sentences with very low values (< 0.05).



5 Evaluation 

The best way to evaluate automatic text summarization systems is not evident,
and it is still an open problem [27]. In general, methods for evaluating text
summarization systems can be classified into two main categories [28]. One is
intrinsic evaluation, where humans judge the summary quality directly.
However, this approach is very difficult to implement in the case of big
corpora (for example, if a multi-document corpus must be summarized).
Extrinsic methods are therefore necessary in this situation.

In an extrinsic evaluation, the summary quality is judged based on how it
affects the performance of other tasks. We chose the coupling with a
Question-Answering system to perform this evaluation. Formally, for extrinsic
evaluation, we applied the Confidence-Weighted Score (CWS) [29] to evaluate
the output of the QA system. CWS was specifically chosen from TREC-2002 to
test a system's ability to recognize when it has found a correct answer. The
questions were ordered in such a way that the highest in ranking was the
question for which the system was most confident in its response and the
lowest was the question for which the system was least confident in its
response. If two or more systems produce the same set of candidate answers,
but in a different ordering, the system which assigns the highest ranking to
the correct answer is regarded as the best one. Formally, the
confidence-weighted score is defined as:

$$\mathrm{CWS} = \frac{1}{Q} \sum_{i=1}^{Q} \frac{i_c}{i} \tag{23}$$

$Q$ is the number of questions and $i_c$ the number of correct answers in the
first $i$ questions (position within the ordered list). The CWS criterion is
used to order the selected candidate answers according to the score of the
sentences provided by the personalized Cortex system in the extrinsic methods
(see Section 7).
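
The confidence-weighted score can be computed with a single pass over the ranked answers, as in the following sketch. The function name `confidence_weighted_score` and the boolean input format are illustrative choices; the computation follows the usual TREC-2002 definition, i.e. the average over positions of the fraction of correct answers seen so far.

```python
def confidence_weighted_score(correct):
    """correct: one boolean per question, ordered from the most to the least
    confident answer; True means the answer was judged correct."""
    running_correct = 0
    total = 0.0
    for position, is_correct in enumerate(correct, start=1):
        running_correct += bool(is_correct)
        total += running_correct / position   # correct answers among the first `position`
    return total / len(correct)

# Same three correct answers out of four, ranked differently:
print(confidence_weighted_score([True, True, True, False]))
print(confidence_weighted_score([False, True, True, True]))
```

Both runs contain the same answers, but the first one places its only error last and therefore obtains the higher score, which is the behaviour described in the text.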

6 Experimental framework I. Intrinsic methods: generic abstracts

In [14,30,31] we showed results of the Cortex system applied to generic
summaries. Its performance is better than or equal to that of other generic
summarization methods. We reproduce some results of these experiments here.
We tested the algorithm on documents coming from various sources (scientific
articles, excerpts of the press on the Internet). We compared our results with
Minds¹⁶, Copernic summarizer¹⁷, Pertinence¹⁸, Quirck¹⁹, Word and baseline
systems. In order to evaluate the quality of the summaries, we compared all
the results with summaries produced by 17 people (students and university
professors) accustomed to writing summaries.

Some tests on the text "Puces" (see Appendix A), which is artificially
ambiguous (because of its heterogeneous mixture of texts from two different
authors), will be presented. The subject "computer chips" is in the first part
(≈ 2/3) of the text, and the presence of fleas in a Swiss military company²⁰
is discussed in the second one²¹. Obviously, no hints of this preliminary
knowledge are submitted to the system. This text contains $N_W = 605$ words.
The segmentation process splits the text into $N_S = 30$ sentences. Then, the
filtering/lemmatization/stemming process returns a set of $N_M = 279$ terms.
It contains $N_L = 30$ distinct terms. The topic of sentences 0 to 14 is
computer chips, whereas sentences 15 to 29 discuss fleas. An abstract of 25%
of the original text size must contain 8 sentences. We expected that the
systems would produce a summary composed of two sub-summaries (taking into
account both subjects). This result was well confirmed for our algorithm.

Figure 6 shows the precision-recall plot for the "Puces" text. In this graphic,
the Copernic and Cortex algorithms yield the best precision values for this
task. Cortex has a value of 62.5% for precision at 100% recall. However, we
think that the precision measure may not be sufficient to evaluate the quality of
extracts. So, the precision-recall plot may be complemented with another evaluation
measure: the quality Q. We evaluated the quality of the extracts obtained for each
method by measuring the value:



16 http://messene.nmsu.edu/minds/SummarizerDemoMain.html

17 http://www.copernic.com

18 http://www.pertinence.net

19 http://www.mcs.surrey.ac.uk/SystemQ/

20 http://www.admin.ch/cp/f/1997Sep10.064053.8237@idz.bfi.admin.ch.html

21 In French, the word "puces" (fleas/chips) is ambiguous in this context.

Fig. 6. Precision-Recall graphic for several algorithms on text "Puces". At 60%
recall, only CORTEX and Copernic yield 100% precision. At 100% recall, CORTEX
shows 62.5% precision. We do not show values for the Word summarizer because
precision and recall are both 0 in this task.
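
Precision and recall, as plotted in Figure 6, compare the sentences selected by a system with those of a reference extract. The following is a minimal sketch of that comparison, assuming that extracts are given as collections of sentence indices. The system extract uses the sentence numbers of the Cortex abstract reproduced in Appendix A; the reference extract is hypothetical and was chosen only to illustrate how an 8-sentence extract can reach 100% recall with 62.5% precision.

```python
def precision_recall(system_extract, reference_extract):
    """Precision and recall of a system extract against a reference extract.

    Both arguments are iterables of sentence indices.
    """
    system = set(system_extract)
    reference = set(reference_extract)
    if not system or not reference:
        raise ValueError("both extracts must contain at least one sentence")
    common = system & reference
    precision = len(common) / len(system)    # share of selected sentences that are relevant
    recall = len(common) / len(reference)    # share of relevant sentences that were selected
    return precision, recall


cortex_extract = [1, 5, 8, 10, 12, 14, 16, 20]   # sentence numbers from Appendix A
hypothetical_reference = [1, 8, 12, 16, 20]      # illustrative only, not real judgements
precision, recall = precision_recall(cortex_extract, hypothetical_reference)
```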



$$Q = \sum_{\mu \in \text{extract}} \langle \mu \rangle \, \theta, \quad \text{where } \theta = \begin{cases} 1 & \text{if segment } \mu \text{ is present in the human's extract} \\ 0 & \text{otherwise} \end{cases} \tag{24}$$

Here ⟨μ⟩ is the mean value for segment μ in the extract compiled by human judges.
Thus, a normalized version of Q is a kind of precision of a method. Values
of normalized Q for the studied methods over seven French texts [30,31] are
plotted in Figure 7. In this graphic, Cortex obtains the best quality values.
Other results [30,31] show that our system is noise robust and less sensitive to the
order in which the sentences are presented, and that the summaries are balanced and
mostly of a better quality.

Fig. 7. Measuring the quality of methods. The vertical axis measures the normalized
quality Q (calculated by equation 24) of extracts over seven French documents.
Our algorithm CORTEX performs better than other methods in this generic
summarizing task.

7 Experimental framework II. Extrinsic methods:
coupling with the LIA-QA system

We found it interesting to couple the Cortex system with the vectorial search engine
LIA-SIAC [32] and the LIA-QA system [33] to evaluate the impact on the answers'
precision. This might be a way to measure the quality of summaries and possibly
to improve the QA system. Thus, two types of experiments were performed. In
the first one, generic abstracts were generated from a multi-document corpus.
The second experiment uses a modification of the Cortex system to generate
personalized abstracts. In both cases, once the digests are obtained, we used
the QA system to find exact answers to specific questions. The statistics were
estimated over 308 questions which were automatically assigned a type and an
associated Named Entity (NE). We used the Cortex system as a noise eliminator,
with a compression rate τ = {80%, 90%}, as a true abstracting system with
a compression rate τ = {10%, 20%}, and finally as an intermediate system with
a compression rate 30% ≤ τ ≤ 70%. Comparisons with baseline and
random systems will also be presented.



7.1 Corpus description 

The whole corpus contains D ≈ 20,000 articles from the French newspaper Le
Monde²² (it was used in the Technolangue EVALDA/EQUER evaluation project)²³.
It covers the years 1987 to 2002 and is a highly heterogeneous proprietary
corpus. This corpus contains text encoded in ISO-Latin, which we converted
to UTF-8 before processing. A set of Q = 308 questions in French was
provided by the company Sinequa²⁴. They correspond to the translation of
some questions used in the TREC programs (a translation of 1893 TREC questions
may be found on the RALI site²⁵). Four examples of questions with their expected
NE are reproduced here:

- À quelle distance de la Terre se situe la Lune ? (What is the distance from
the Earth to the Moon?) NE: <DISTANCE>

- En France, à combien s'élève le pourcentage de la TVA sur les Compact
Disc ? (What is the percentage of VAT on Compact Discs in France?) NE:
<VALUE>

- Où la Roumanie se situe-t-elle ? (Where is Romania located?) NE: <PLACE>

- Que signifie RATP ? (What does RATP mean?) NE: <SOCIETY>

22 www.lemonde.fr

23 http://www.technolangue.net/article61.html

24 www.sinequa.com

25 www-rali.iro.umontreal.ca/LUB/qabilingue.en.html

In addition to the set of questions, Sinequa provided us, for each question,
with a list of documents sharing at least one word with the question, together
with a labelled version of these documents. In this labelled version the types of
named entities (nouns, dates, names of companies, places, durations, sizes, ...) are
marked, as well as the type of the required entity, if it exists.



7.2 Selection of candidate answers 

The LIA-QA approach to the selection of candidate answers relies on
the vectorial search engine LIA-SIAC [32] to localize the required entity
type and on the exploration of knowledge bases [33,34]²⁶. Initially, the documents,
summarized or not, are split into lexical units and sentences, labelled syntactically
and then lemmatized. After the proximities of the sentences to the question
are calculated, the sentences are ordered according to these values (we use a
tf.idf weighting and a cosine value)²⁷. Then, a filter can be applied to preserve only
the sentences containing at least one named entity corresponding to the expected
type for the question. In a simplistic way, the response of the system could be
the first entity of the expected type appearing in the sentence²⁸.
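
The ordering of sentences by proximity to the question can be sketched as follows. This is not the LIA-SIAC implementation: it is a minimal illustration that assumes already lemmatized token lists, a common smoothed idf variant, and document frequencies computed over the candidate sentences themselves. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def rank_sentences(question_tokens, sentences_tokens):
    """Return (cosine, sentence_index) pairs sorted from closest to farthest."""
    n = len(sentences_tokens)
    # Document frequency: number of sentences in which each term occurs.
    df = Counter(term for tokens in sentences_tokens for term in set(tokens))

    def idf(term):
        # Smoothed idf, so that unseen or ubiquitous terms keep a finite weight.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def vector(tokens):
        return {term: count * idf(term) for term, count in Counter(tokens).items()}

    def cosine(u, v):
        dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query = vector(question_tokens)
    scored = [(cosine(query, vector(tokens)), index)
              for index, tokens in enumerate(sentences_tokens)]
    # Highest cosine first; ties are broken by document order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


question = ["distance", "terre", "lune"]
sentences = [
    ["lune", "distance", "terre", "kilometre"],
    ["roumanie", "europe"],
    ["terre", "planete"],
]
ranking = rank_sentences(question, sentences)
```

A filter on the expected named entity type, as described above, would then be applied to the ranked list before an answer is extracted.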



7.3 Search of answers in generic summaries 

Tests on generic summaries were carried out to verify their capacity to preserve
informative textual zones, that is, the zones suitable for answering precise questions.
Tables 1 and 2 show that the number of correct answers returned by the system
is not proportional to the compression rate.

Cortex was applied to the corpus to generate summaries at different compression
rates τ = {10%, 20%, 30%, ..., 90%}. In this stage, all 10 metrics
are used. Table 1 shows the real rates observed after creation of the summaries
according to the number of sentences, words or characters for some compression
rates τ.

Table 2 shows the results obtained when the QA system was coupled to the
generic Cortex system (generic QAAS). We expected that if the QA system
worked on summaries instead of full text, it would reject non-informative areas
that may generate false hypotheses, and preserve those which will provide right
26 It should be noted that for the set of experiments presented here, we did not exploit 
the knowledge bases usually employed by the LIA-QA system. 

27 In these experiments we have neither enriched nor modified the question, except for 
the elimination of the function words and lemmatization. 

28 This way of proceeding has at least two shortcomings: filtering out sentences
too drastically sometimes makes the extraction of an answer impossible, and
selecting the "first" entity may lead to an inadequate choice when several
entities of the same type are present in the same sentence.



Expected rate τ (%)    Observed rate (%)
                       Sentences    Words    Characters
80                     71.1         77.6     77.5
50                     45.8         59.3     59.2
20                     20.6         31.5     31.5

Table 1. Real compression rate of summaries in number of sentences, words
and characters, as a function of the desired rate.
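
The observed rates of Table 1 are ratios between the size of a summary and the size of the original text, measured in three different units. A minimal sketch of this computation is given below; it assumes whitespace tokenization and sentences given as strings, and the function name and toy data are hypothetical. When the selected sentences are longer than average, the rate observed in words or characters exceeds the rate observed in sentences, which is the pattern visible in Table 1.

```python
def observed_rates(original_sentences, summary_sentences):
    """Observed compression rates (in %) in sentences, words and characters."""

    def sizes(sentences):
        text = " ".join(sentences)
        return len(sentences), len(text.split()), len(text)

    original = sizes(original_sentences)
    summary = sizes(summary_sentences)
    units = ("sentences", "words", "characters")
    return {unit: 100.0 * s / o for unit, s, o in zip(units, summary, original)}


original = ["a b c d", "e f", "g h i j k l"]
summary = ["g h i j k l"]            # one sentence out of three, but the longest one
rates = observed_rates(original, summary)
```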



answers to the questions. In addition, we also hoped to save time because of the
reduction of the search space. The score of each sentence was calculated by the
LIA-SIAC engine, after which the LIA-QA system uses this score to find the
answers.



Compression rate τ (%)    100     90     80     70     60     50     40     30     20     10
Responses                 187    187    187    187    187    187    187    187    185    179
Correct answers            50     46     46     42     42     42     38     38     33     26
CWS                     30.89  25.33  25.34  25.33  24.46  24.32  23.03  22.11  21.78  17.38

Table 2. Correct answers, responses and CWS (see equation 23) found by the
generic CORTEX system. We show that the number of responses and correct
answers is slightly degraded by high compression rates (τ < 50%).



7.4 Search of answers in user's query-based summaries 

We have shown how coupling a question-answering system with a text
summarization system may make the former more efficient by reducing the
search space, without significantly altering the quality of results. However, the
use of generic summaries might be limited. One may hope that a query-oriented
summary could find the answers more efficiently, because the documents would
be condensed in a targeted way. In this section, we explain how to adapt the
generic summarization system to obtain a customized summarization system,
whose behaviour is adapted to the questions submitted by the user. The
personalization of summaries (taking into account the user's question) should increase
the chances of not eliminating correct answers, and we therefore expect it to
improve the precision of the answers. Figure 9 shows the architecture
of the LIA-QAAS (Question-Answering Automatic-Summarization) system.

First, LIA-SIAC extracts a subset of relevant documents for each question
from the corpus. Concurrently, the set of Q = 308 questions is filtered,
lemmatized and stemmed. An expansion process (described below)
is applied to this question set. Thereafter, a multi-document abstract at variable
compression rates (10% ≤ τ ≤ 90%) is obtained by Cortex. In this stage, the
score of each sentence is local to its document. In the next step, the
multi-document abstracts are re-scored by using the Cortex system once more. The
result consists of a set of query-personalized sentences sorted by score for each
question. The process is described in the following paragraphs.



Adding search terms to a user's query. Query expansion consists of adding
search terms to a user's weighted search. For example, a search for car may
be expanded into {car, cars, auto, autos, automobile, automobiles} and
then lemmatized to {car, auto, automobile}. Query expansion was applied to
this set by adding synonymous terms taken from a simple thesaurus. This
results in one or two additional terms for each term in the user's query. Query
expansion has the disadvantage that undesirable noise may be added, but the
purpose is to improve precision and/or recall by using a more flexible query. In
our case the introduction of noise is minimized, because the expansion is applied
only to the query and not to the full text.
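
The expansion step can be sketched as follows, using the car example. This is a minimal illustration under stated assumptions: the thesaurus and the lemma table are hypothetical toy dictionaries, and each query term is lemmatized before the thesaurus lookup so that inflected forms such as cars find their synonyms.

```python
def expand_query(terms, thesaurus, lemmas, max_synonyms=2):
    """Expand query terms with at most max_synonyms synonyms each, then lemmatize.

    thesaurus maps a lemma to a list of synonyms; lemmas maps an inflected
    form to its lemma. Both are illustrative stand-ins for real resources.
    """
    expanded = []
    for term in terms:
        base = lemmas.get(term, term)
        candidates = [base] + list(thesaurus.get(base, []))[:max_synonyms]
        for candidate in candidates:
            lemma = lemmas.get(candidate, candidate)
            if lemma not in expanded:      # keep each lemma once, in order of appearance
                expanded.append(lemma)
    return expanded


THESAURUS = {"car": ["autos", "automobile"]}                         # hypothetical entries
LEMMAS = {"cars": "car", "autos": "auto", "automobiles": "automobile"}
expanded = expand_query(["cars"], THESAURUS, LEMMAS)
```

Limiting the number of synonyms per term keeps the added noise small, in line with the one or two additional terms mentioned above.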

Formally, each query is represented by a vector q_j, where j = 1, ..., Q. For
each document in the corpus, the vector representing its main title is
replaced by the set of all query vectors q_j whose answer is likely to be
found in the document (see equation 17). The only metrics used are Frequency
F and Angle θ. This combination measures the similarity between q_j
and each sentence, i.e. it favours the sentences that are closest to the user's question.

Then, a set of abstracts at variable compression rates τ is generated by the
Cortex system. At the end of this process, we obtain the most informative text
areas of each document that match the query. Finally, a multi-document
abstract is generated for each question. In this stage, each sentence is ranked
locally (sentences coming from a particular text are ranked only against sentences
from the same text). The generic and personalized LIA-QAAS systems are
depicted in Figures 8 and 9, respectively.



Re-ranking of candidate segments. At this stage, a summary has been generated
for each question in the multi-document corpus. Since each sentence's
score is local to one document, several sentences may have the same score (for
example, many sentences may have a local decision score A = 1.0), and they must be
globally re-scored to avoid decision conflicts in the QA system. Another re-ranking
process is therefore applied to obtain a unique global score per sentence (one that
takes into account all the documents for the query). This global score
depends on the sentence's degree of similarity to the query. In this phase,
terms in the document that are not present in the query vector are filtered out.
We obtain a new set of documents to which Cortex is re-applied with all metrics.
Table 3 shows the results found by the QA system coupled to the query-guided
Cortex system (personalized QAAS).

Fig. 8. The LIA Generic Question Answering-Automatic Summarization
(QAAS) system. A user's natural language question is transformed into a query.
Candidate documents are chosen from the multi-document corpus by LIA-SIAC, and
then pre-processed. CORTEX summarizes this multi-document subset to generate
a generic summary. The LIA-QA system is applied to this summary to generate
an answer to the question.

Figures 10 and 11 show our results. We have compared the QAAS system with
the generic Cortex summarization system and the Baseline system. Baseline
tests were performed in two ways: as a random system or as a baseline system (the
leading sentences, up to the target percentage). In both cases, the score for each
sentence (a value in [0,1]) was randomly generated.

Figure 10 shows a comparison between the baseline/random methods and the
personalized extracts from the Cortex system. We note that personalized summaries
are much better than the other methods. However, the baseline system
performs well at compression rates lower than 50%. This can be explained
by the nature of the corpus. The newspaper articles are written in an "intrinsic"
baseline form, the common style of journalists: the main information is
duplicated and located at the top (first lines) of the document.

Finally, Figure 11 shows the Precision-Recall values (Correct answers,
Responses) for the random and baseline systems, the QAAS system with generic
and customized summarization, and QA applied to the full text corpus: the
precision value for the personalized QAAS system is higher than the precision value
obtained with full text.



7.5 Analysis of the results 

The results obtained show that the degradation is minimal (between 1% and 3%)
in spite of a high compression rate, when a customized summary is used. When
full documents (without being summarized) are processed by our information

Fig. 9. The LIA Personalized QAAS system. A user's natural language question
is transformed into a query. Then query expansion is applied. Candidate
documents are chosen from the multi-document corpus by LIA-SIAC, and then
pre-processed. CORTEX summarizes this multi-document subset to generate a
customized summary. This summary is filtered and its sentences are re-ranked
by Cortex. Finally, the LIA-QA system is applied to generate an answer to the
question.



[Figure 10: plot of CWS against compression rate τ (%), from 100 down to 10, for five systems: QAAS customized summary, full text (SIAC), QAAS generic, baseline and random.]

Fig. 10. CWS values for customized vs. generic abstracts, random and baseline
systems. The best personalized QAAS value, CWS = 30.75 (filled circles)
for τ = 30%, is very close to the value CWS = 30.89 obtained with full text (filled
squares). In the case of generic abstracts, CWS values are lower. This
shows that query-based summaries filter and preserve the most informative
segments of each document.

Fig. 11. Precision-Recall values for all systems and for full text search (from
the results tables). For summarization systems, we fixed the compression rate at
τ = 10% (left) and τ = 20% (right). Systems at the top right are better. The
performance of the personalized QAAS system is the best, and the volume of the search
space is smaller.



Compression rate τ (%)    100     90     80     70     60     50     40     30     20     10
Responses                   …      …      …      …      …      …      …      …      …    186
Correct answers            50     52     54      …      …     51     53     54     52     52
CWS                     30.89  29.69  29.96  28.97  29.69  30.00  29.30  30.75  30.13  29.66

Table 3. Correct answers, responses and CWS (see equation 23) generated by
the personalized CORTEX system (QAAS). We show that the number of responses
and correct answers is not at all correlated to the compression rates. At
compression τ = 30% we found four more correct answers (54) than when analysing
full text (50) and CWS is very close (30.75 vs. 30.89).



retrieval system, the answers (the references) are found in 206 cases out of 308²⁹.
But in 48 cases, the segments associated with these answers do not correspond
to the question (even if they contain the words of the answers, they are used
in a context different from the context of the question). Thus, only 158 of the
206 answers found in the documents may be considered as correct. Let us give
as an example the following question: "Who created The New York Post?"³⁰
The correct answer Alexandre Hamilton is returned by the system, based on
the following segment:

"Created by the conservative Alexandre Hamilton in 1801, it remained
faithful to the ideas of its founder but it often changed hands, particularly lately:
briefly the property of Rupert Murdoch in the Eighties, the Post was, since 1988,
that of Mr. Peter Kalikow, heir to a real-estate empire."³¹

Here the correct reference was indeed found, but the sentence used by the
system to find it is considered to be insufficient at the time of the evaluation.
The segment does mention the creation of a newspaper called Post by Alexandre
Hamilton, but there is no evidence that this one is The New York Post.

Here is a similar example: the correct answer Mitch to the question "What
hurricane devastated Central America in 1998?"³² is found by the system, but
the justification is insufficient: Hurricane Mitch devastated Central America
a short time after Johnny "had set fire to the Stade de France"³³. Since no date is
mentioned in the passage, it cannot be considered as supporting the answer.



29 For 20 questions there is no response. 

30 « Qui a créé le New-York Post ? »

31 Créé en 1801 par le conservateur Alexandre Hamilton, il est resté fidèle aux idées
de son fondateur mais il a souvent changé de mains, particulièrement ces derniers
temps : brièvement propriété de Rupert Murdoch dans les années 80, le Post était,
depuis 1988, celle de M. Peter Kalikow, l'héritier d'un empire immobilier.

32 « Quel ouragan a dévasté l'Amérique centrale en 1998 ? »

33 L'ouragan Mitch a dévasté l'Amérique centrale peu de temps après que Johnny eut
« mis le feu au Stade de France »



8 Conclusion 



The Cortex algorithm is a very powerful text summarization system. We
measured the quality of our summaries with intrinsic and extrinsic methods. In
intrinsic evaluation methods, our digests have a similar or higher quality than
those of other methods. Our algorithm is able to process large corpora in three
languages (English, French and Spanish). Balanced summaries are obtained, and the
majority of topics are taken into account. The Decision Algorithm, based on the
weighted votes of metrics, is robust, convergent and independent of the order
of segments. Two extrinsic methods were used to evaluate the quality of
summaries: we coupled generic and query-guided text summarization systems with
a question-answering system. Generic summaries act as a powerful noise filter,
but the quantity of answers found by the Question-Answering (QA) system
decreases as the compression becomes stronger. However, with a customized
summary, where texts are filtered and condensed in a targeted way, the QA system
performs much better. Customized summaries reduce the risk of eliminating
correct answers. Tests on the Le Monde corpus showed that the Cortex algorithm
preserves the relevant sentences, and that the QA system preserves its good
performance, evaluated by the CWS criterion. This is true even at high
compression rates (about 10%), when a customized summary is used. We think that the
number of correct answers may be increased if the system calculates the most
appropriate Named Entity in the summarizing step before invoking the QA system.
We are currently improving our system with that feature.

A The text "Puces"

Note that the sentence containing the fragment "...stationnée à Avenches, sont
envahis parles puces..." contains the following mistake: parles (to speak) must
be written par les (by the). The segment "...Des piqûres de puces ont été relevées
sur plus d'untiers des..." contains the following mistake: untiers must be
written un tiers (a third). However, a small quantity of noise does not
greatly affect the performance of the Cortex algorithm.

Informatique et puces. 

Et si l'ordinateur pouvait fonctionner un jour, sans électricité ou presque ? La démarche
de chercheurs américains de l'université de Notre Dame, dans l'Indiana, montre que l'on
peut manipuler des électrons pour construire des circuits élémentaires avec des quantités
d'énergie infimes. Leurs expériences, relatées dans l'édition du 9 avril du magazine
Science, ouvrent la voie à des composants capables de fonctionner à des fréquences 10
à 100 fois plus élevées que celles des puces actuelles qui sont bridées par des problèmes
de dissipation de chaleur. Les travaux de l'équipe dirigée par Greg Snider portent sur
le puits quantique, un piège infinitésimal dans lequel un électron peut être enfermé.
Les scientifiques ont créé des cellules carrées formées de quatre puits quantiques, dans
laquelle ils ont introduit une paire d'électrons. Les forces de répulsion provoquent le
déplacement des électrons qui trouvent leur équilibre lorsqu'ils se trouvent placés aux
deux extrémités de l'une ou l'autre des diagonales de la cellule. La première représente
l'état 0, tandis que l'autre indique le 1: chaque cellule représente donc un bit, la plus
petite quantité d'information que l'on peut manipuler dans les ordinateurs. Tout
déplacement d'un électron sous l'effet d'une force extérieure provoque automatiquement
le déplacement du second électron de manière à retrouver l'équilibre, et donc le
basculement de la cellule entre les états 0 et 1. L'utilisation d'une cellule unique ne prouve
rien. Les chercheurs américains ont réussi à en assembler plusieurs, provoquant, suivant
leurs besoins, le déplacement des électrons sans devoir fournir d'énergie, ou presque.
Dans les transistors actuels, le passage de l'état 0 à l'état 1 n'est possible qu'au prix
du déplacement de plusieurs milliers d'électrons, ce qui génère un important flux de
chaleur. En regroupant cinq cellules élémentaires, les chercheurs ont mis au point un
circuit baptisé "majoritaire" capable de réaliser les deux fonctions logiques de base,
ET et OU, à la demande. Ils ont ensuite vérifié son bon fonctionnement et espèrent
assembler plusieurs de ces circuits pour effectuer des additions et des multiplications sur
des nombres. En cas de succès, la technique des cellules logiques quantiques pourrait
permettre d'entasser des centaines de milliards de circuits dans une seule puce
électronique. Pour l'instant, le dispositif fonctionne seulement à une température voisine
du zéro absolu, mais les chercheurs ne désespèrent pas de parvenir à le réchauffer tout
en maîtrisant son comportement. Les cantonnements de la compagnie IV de l'école
de recrues d'infanterie d'exploration et de transmission 213, stationnée à Avenches,
sont envahis par les puces et les poux. Des piqûres de puces ont été relevées sur plus
d'untiers des militaires. On a aussi retrouvé des cadavres de poux sur 3 militaires. Des
mesures d'urgence ont été prises en conséquence. Des piqûres de puces ont été
diagnostiquées sur plus d'un tiers des 155 hommes de la compagnie IV de l'école de recrues
d'infanterie d'exploration et de transmission 213. Des cadavres de poux, mais aucun
oeuf, ont également été décelés sur 3 militaires. Ces insectes sont transmis par contact
personnel. La cause de cette invasion n'est pas claire; ces insectes semblent toutefois
avoir essaimé à partir du local de garde. Le médecin de troupe a donné immédiatement
les soins nécessaires aux militaires concernés et il a ordonné les mesures d'hygiène
qui s'imposaient. Des produits spéciaux ont été remis pour les soins corporels. Tout le
matériel personnel de la compagnie a été emballé hermétiquement et apporté à l'arsenal
cantonal de Fribourg. La troupe sera déplacée dans un complexe industriel. Une section
d'hygiène de l'école de recrues d'hôpital 268, stationnée à Moudon, va désinfecter tous
ces cantonnements. On estime qu'avec ces mesures sanitaires appropriées la troupe
pourra réintégrer ses cantonnements vendredi au plus tard.



Generic abstract generated by CORTEX (τ = 25%, in brackets the
number of extracted sentences)

[1] La démarche de chercheurs américains de l'université de Notre Dame, dans l'Indiana,
montre que l'on peut manipuler des électrons pour construire des circuits élémentaires
avec des quantités d'énergie infimes. [5] Les forces de répulsion provoquent le
déplacement des électrons qui trouvent leur équilibre lorsqu'ils se trouvent placés aux deux
extrémités de l'une ou l'autre des diagonales de la cellule. [8] Tout déplacement d'un
électron sous l'effet d'une force extérieure provoque automatiquement le déplacement du
second électron de manière à retrouver l'équilibre, et donc le basculement de la cellule
entre les états 0 et 1. [10] Les chercheurs américains ont réussi à en assembler plusieurs,
provoquant, suivant leurs besoins, le déplacement des électrons sans devoir fournir
d'énergie, ou presque. [12] En regroupant cinq cellules élémentaires, les chercheurs ont
mis au point un circuit baptisé "majoritaire" capable de réaliser les deux fonctions
logiques de base, ET et OU, à la demande. [14] En cas de succès, la technique des
cellules logiques quantiques pourrait permettre d'entasser des centaines de milliards de
circuits dans une seule puce électronique. [16] Les cantonnements de la compagnie IV
de l'école de recrues d'infanterie d'exploration et de transmission 213, stationnée à
Avenches, sont envahis parles puces et les poux. [20] Des piqûres de puces ont été
diagnostiquées sur plus d'un tiers des 155 hommes de la compagnie IV de l'école de recrues
d'infanterie d'exploration et de transmission 213.